Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games
Authors
Abstract
Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment. It captures the online decision-making problems that each player faces in an extensive-form game, as well as Markov decision processes and partially-observable Markov decision processes where the agent conditions on observed history. Over the past decade, there has been considerable effort into designing online optimization methods for TFSDM. Virtually all of that work has been in the full-feedback setting, where the agent has access to counterfactuals, that is, information on what would have happened had the agent chosen a different action at any decision node. Little is known about the bandit setting, where that assumption is reversed (no counterfactual information is available), despite this latter setting being well understood for almost 20 years in one-shot decision making. In this paper, we give the first algorithm for the bandit linear optimization problem for TFSDM that offers both (i) linear-time iterations (in the size of the decision tree) and (ii) O(sqrt(T)) cumulative regret in expectation compared to any fixed strategy, at all times T. This is made possible by new results that we derive, which may have independent uses as well: 1) geometry of the dilated entropy regularizer, 2) autocorrelation matrix of the natural sampling scheme for sequence-form strategies, 3) construction of an unbiased estimator for linear losses for sequence-form strategies, and 4) a refined regret analysis for mirror descent when using the dilated entropy regularizer.
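The abstract relies on an unbiased loss estimator built from bandit feedback. As a minimal sketch of that idea in the one-shot setting only (this is the standard importance-weighting trick, not the paper's sequence-form construction), the Python below checks unbiasedness exactly on a toy instance. The function names, the sampling distribution, and the loss values are all hypothetical and chosen for illustration.

```python
from typing import List


def importance_weighted_estimate(probs: List[float], action: int, loss: float) -> List[float]:
    """Estimate the full loss vector from the single loss observed for `action`."""
    estimate = [0.0] * len(probs)
    # Dividing by the sampling probability compensates for how rarely the action is played.
    estimate[action] = loss / probs[action]
    return estimate


def expected_estimate(probs: List[float], losses: List[float]) -> List[float]:
    """Exact expectation of the estimate over the agent's random choice of action."""
    total = [0.0] * len(probs)
    for action, p in enumerate(probs):
        est = importance_weighted_estimate(probs, action, losses[action])
        total = [t + p * e for t, e in zip(total, est)]
    return total


# Hypothetical toy instance: three actions, all sampled with positive probability.
probs = [0.5, 0.25, 0.25]
losses = [0.2, 0.8, 0.4]  # true losses; the agent only ever observes one entry per round
print(expected_estimate(probs, losses))
```

In expectation the estimate matches the true loss vector, which is what lets a full-feedback method such as mirror descent be run on estimated losses. The estimator requires every action to have positive sampling probability, and its variance grows as those probabilities shrink.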
Related resources
Decision Support for Extensive Form Negotiation Games
This paper presents a tool, NEGEXT, for finding individual and group strategies to achieve certain goals while playing extensive form negotiation games. NEGEXT is used as a model-checking tool which investigates the existence of strategies in negotiation situations. We consider sequential and parallel combinations of such games also. Thus, it may aid students of negotiation in their understandi...
Distributionally Robust Optimization for Sequential Decision Making
The distributionally robust Markov Decision Process approach has been proposed in the literature, where the goal is to seek a distributionally robust policy that achieves the maximal expected total reward under the most adversarial joint distribution of uncertain parameters. In this paper, we study distributionally robust MDP where ambiguity sets for uncertain parameters are of a format that ca...
A Unification of Extensive-Form Games and Markov Decision Processes
We describe a generalization of extensive-form games that greatly increases representational power while still allowing efficient computation in the zero-sum setting. A principal feature of our generalization is that it places arbitrary convex optimization problems at decision nodes, in place of the finite action sets typically considered. The possibly-infinite action sets mean we must “forget”...
Extensive-Form Argumentation Games
Two prevalent approaches to automated negotiation are the application of game-theoretic notions and the argumentation-based angle; these two schemes are frequently at odds. An elegant view of argumentation is Dung’s abstract argumentation theory [2], which cold-shoulders the internal structure of arguments in favor of the entire debate’s global structure. Dung’s theory is elaborated by work in ...
Extensive Form Games
I. Review of the Notation of Extensive Form Game Theory A. Let T be a finite set of nodes, thought of as states of the game where uncertainty is resolved, a choice is made, or, in the case of terminal nodes, where the game is over and payoffs are realized. 1. The nodes in T are partially ordered by a relation called precedence. 2. Formally the precedence relation is ≺ ⊂ T × T . 3. We assume tha...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i6.16677